27 JUL 2018 by ideonexus

 Redundancy in the English Language

Whenever we communicate, rules everywhere restrict our freedom to choose the next letter and the next pineapple. Because these rules render certain patterns more likely and certain patterns almost impossible, languages like English come well short of complete uncertainty and maximal information: the sequence “th” has already occurred 6,431 times in this book, the sequence “tk” just this once. From the perspective of the information theorist, our languages are hugely predictable— almost borin...

Monte Carlo method for building words and sentences.
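The letter-pair idea in the excerpt can be tried on a small scale. The sketch below is only an illustration, not the method used in the quoted book: it counts adjacent letter pairs in a short made-up sample string, then builds new text by repeatedly drawing the next letter at random, weighted by how often it followed the current one. The function names and the sample sentence are assumptions chosen for this example.

```python
import random
from collections import Counter, defaultdict


def digram_counts(text):
    """Count adjacent character pairs, keeping only letters and spaces."""
    letters = [c for c in text.lower() if c.isalpha() or c == " "]
    return Counter(zip(letters, letters[1:]))


def build_transitions(counts):
    """Group pair counts by first character: table[a][b] = times b followed a."""
    table = defaultdict(Counter)
    for (a, b), n in counts.items():
        table[a][b] += n
    return table


def generate(table, start, length, rng):
    """Monte Carlo text: draw each next character in proportion to its count."""
    out = [start]
    for _ in range(length - 1):
        followers = table.get(out[-1])
        if not followers:
            break  # the current character was never followed by anything
        chars = list(followers)
        weights = [followers[c] for c in chars]
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


# Hypothetical sample text; a real experiment would use a much larger corpus.
sample = "the theory of information treats the text as a chain of letters"
counts = digram_counts(sample)
table = build_transitions(counts)
text = generate(table, "t", 40, random.Random(0))
print(counts[("t", "h")], counts[("t", "k")])
print(text)
```

Even with this tiny sample, a pair such as "th" is counted several times while "tk" never appears, so the generated string can only contain pairs that the sample already allowed. That restriction is the redundancy the excerpt describes.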

02 JAN 2011 by ideonexus

 Graph Theory Approach to Web Topology

Perhaps the best-known paradigm for studying the Web is graph theory. The Web can be seen as a graph whose nodes are pages and whose (directed) edges are links. Because very few weblinks are random, it is clear that the edges of the graph encode much structure that is seen by designers and authors of content as important. Strongly connected parts of the webgraph correspond to what are called cybercommunities and early investigations, for example by Kumar et al., led to the discovery and mappin...

The graph theory approach produces a model of the web that is shaped like a bowtie and filled with smaller bowties, like a fractal. The original document includes an image of this phenomenon.
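As a rough illustration of the bowtie picture, the sketch below treats pages as nodes and links as directed edges, finds the largest strongly connected group of pages (the knot of the bowtie), and then sorts the remaining pages into those that can reach the knot, those reachable from it, and everything else. The tiny link table and the labels "core", "in", "out" and "other" are assumptions made for this example, not data or terminology taken from the excerpt.

```python
from collections import deque


def reachable(adj, start):
    """Return every node reachable from start by following directed edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(adj):
    """Build the reversed graph; every node gets an entry, even link-less ones."""
    rev = {}
    for src, targets in adj.items():
        rev.setdefault(src, [])
        for dst in targets:
            rev.setdefault(dst, []).append(src)
    return rev


def bowtie(adj):
    """Split a directed graph into core, in, out and other parts."""
    rev = reverse(adj)
    nodes = set(rev)
    core, assigned = set(), set()
    for v in sorted(nodes):
        if v in assigned:
            continue
        # A strongly connected component is the set of nodes that v can
        # reach and that can also reach v.
        scc = reachable(adj, v) & reachable(rev, v)
        assigned |= scc
        if len(scc) > len(core):
            core = scc
    if not core:
        return {"core": set(), "in": set(), "out": set(), "other": set()}
    seed = next(iter(core))
    downstream = reachable(adj, seed)
    upstream = reachable(rev, seed)
    return {
        "core": core,
        "in": upstream - core,
        "out": downstream - core,
        "other": nodes - upstream - downstream,
    }


# Hypothetical miniature web: page name -> pages it links to.
links = {
    "a": ["b"], "b": ["c"], "c": ["a", "d"],
    "d": ["e"], "x": ["a"], "y": ["x"], "z": [],
}
parts = bowtie(links)
print(parts)
```

This brute-force search is quadratic and suits only small graphs; on a large crawl a linear-time strongly connected components algorithm such as Tarjan's would be the usual choice. Running the same split again inside one of the parts is one way to look for the nested, fractal-like bowties the note mentions.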